Update nanojson to include upstream and FireMasterK's optimizations #1382
Conversation
Force-pushed from b4d6e15 to 144520a.

---
After making sure nanojson produces JDK 11-compatible binaries, the CI here builds and all tests pass.

---
get() may return objects with types that are used internally in the nanojson library, such as LazyString or Number; the specialized accessor methods should be used instead. This commit removes all such raw usages (except in SoundcloudParsingHelper::resolveIdWithWidgetApi()). By doing so, timeago-generator compiles again, and the YoutubeChannelExtractor::getTags() and JsonUtils tests pass again. The compilation and tests had failed because nanojson now internally uses LazyString instead of plain String.
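For illustration, a minimal sketch of the difference (assuming nanojson's usual `JsonObject` accessors; the key name is made up):

```java
import com.grack.nanojson.JsonObject;
import com.grack.nanojson.JsonParser;
import com.grack.nanojson.JsonParserException;

public class AccessorExample {
    public static void main(String[] args) throws JsonParserException {
        JsonObject obj = JsonParser.object().from("{\"title\":\"hello\"}");

        // Raw access inherited from LinkedHashMap: the returned Object may be an
        // internal nanojson type such as LazyString, so casts to String (or
        // instanceof String checks) can break after the update.
        Object raw = obj.get("title");

        // Specialized accessor: always yields a plain String (or null).
        String title = obj.getString("title");
        System.out.println(raw.getClass() + " vs " + title);
    }
}
```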
---
LGTM, but the mock tests ran slower with these changes included (ca. 300 ms of additional runtime on average over 10 runs). However, I do not know how the JUnit tests are run and how they differ from the extractor usage on mobile devices.
---
I collected 10 samples from before and after, and calculated the average and standard deviation. Given that data, we can't conclude which of the two commits is faster, since the two distributions have basically the same mean and a high standard deviation. I also don't think the time taken to run tests is a good measure of performance; I'm pretty sure Gradle introduces a lot of overhead to report test results.
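For reference, this is roughly how such a comparison can be computed (a standalone sketch; the sample values below are made up for illustration, not the actual measurements):

```java
import java.util.Arrays;

public class TimingStats {
    public static void main(String[] args) {
        // hypothetical per-run test durations in ms, before and after the change
        double[] before = {4120, 4350, 4080, 4500, 4210, 4400, 4150, 4300, 4260, 4190};
        double[] after  = {4200, 4100, 4480, 4300, 4050, 4420, 4230, 4360, 4150, 4280};
        print("before", before);
        print("after ", after);
    }

    static void print(String label, double[] xs) {
        double mean = Arrays.stream(xs).average().orElse(0);
        // sample standard deviation (divide by n - 1)
        double var = Arrays.stream(xs).map(x -> (x - mean) * (x - mean)).sum() / (xs.length - 1);
        System.out.printf("%s: mean = %.0f ms, stddev = %.0f ms%n", label, mean, Math.sqrt(var));
    }
}
```

When the difference between the means is well within one standard deviation, as here, the measurement is inconclusive.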
---
Wait, I wrote this simple performance test that just tries to open all mock JSON files. It takes 1.8s on 837705a (before) and 26.5s on 1f6cb35 (i.e. this PR). This is very strange...

```java
Path resourcePath = Paths.get("src", "test", "resources");
var jsonFiles = new ArrayList<>();
try (var paths = Files.walk(resourcePath)) {
    for (Path p : paths.collect(Collectors.toList())) {
        if (!Files.isRegularFile(p) || !p.toString().endsWith(".json")) {
            continue;
        }
        // parse every mock file with nanojson and keep the result
        try (var in = new FileInputStream(p.toFile())) {
            jsonFiles.add(JsonParser.any().from(in));
        }
    }
}
assertEquals(808, jsonFiles.size());
```
---
The major efficiency gains will show up when the application is running for a long time, and from lazy strings. String deserialization takes time because of the UTF validation; a LazyString defers that work, so CPU time is spent parsing a string only when it is actually read, which is quite useful for us since we don't read all strings in every JSON object. I borrowed this idea from the Jackson library. The buffer pool is useful when the application is long-running: it reduces GC thrashing and the overhead of creating a new buffer for every JSON parse. The benefits would be more visible when a lot of JSON is being parsed or the application runs for a long time, so that buffers are reused instead of recreated - mainly server use-cases like Piped rather than the NewPipe application. The best way to benchmark would be to use a microbenchmarking tool like JMH.
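To make the lazy-string idea concrete, here is a hypothetical sketch of the technique (my own illustration, not nanojson's actual LazyString class): keep the raw UTF-8 bytes from the parser buffer and pay the decoding/validation cost only on first access.

```java
import java.nio.charset.StandardCharsets;

// Sketch only: a CharSequence that defers UTF-8 decoding until first use.
final class LazyStringSketch implements CharSequence {
    private final byte[] utf8;  // raw bytes straight from the parser buffer
    private String decoded;     // filled in lazily

    LazyStringSketch(byte[] utf8) {
        this.utf8 = utf8;
    }

    private String decode() {
        if (decoded == null) {
            // the UTF-8 validation/decoding cost is paid here, only if the value is read
            decoded = new String(utf8, StandardCharsets.UTF_8);
        }
        return decoded;
    }

    @Override public int length() { return decode().length(); }
    @Override public char charAt(int i) { return decode().charAt(i); }
    @Override public CharSequence subSequence(int s, int e) { return decode().subSequence(s, e); }
    @Override public String toString() { return decode(); }
}
```

Strings that are never read never get decoded, which is where the CPU savings come from.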
---

That's very odd; I will take a look with a profiler later to see what's causing the issue.
---
@FireMasterK I already discovered why: that set of JSON files contains very long strings, and the reusableBuffer's size was growing linearly instead of exponentially.
---

Ah, that would make sense! The current logic grows the buffer size in increments of 512 as needed. Could you check with FireMasterK/nanojson@aae1500, where I double the size instead of growing more conservatively?
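For context, the standard amortized-analysis argument behind the two growth strategies: with a fixed increment, appending $n$ characters triggers about $n/512$ reallocations, and the $k$-th reallocation copies the roughly $512k$ characters already stored, so

$$\sum_{k=1}^{n/512} 512k = \Theta(n^2) \text{ total copy work.}$$

With geometric growth by a factor $g > 1$ (doubling, or the 1.3 used below), the copied sizes form a geometric series, so the total copy work is bounded by

$$n + n/g + n/g^2 + \dots \le \frac{g}{g-1}\,n = O(n),$$

i.e. $O(1)$ amortized per appended character, which is why parsing a long string drops from quadratic to linear time.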
---
@FireMasterK yes, that works. I actually used the diff below, but it's doing the same thing; I found that 1.3 also works fine as a growth factor, which might save some memory in extreme cases.

**Diff**

```diff
commit e890a9e85df905aeb8cff91f43c612c643d37127
Author: Stypox
Date:   Mon Oct 6 21:57:52 2025 +0200

    Ensure exponential growth of container size

    This makes it so that even for long strings the parse time is O(n) amortized instead of O(n²)

diff --git a/src/main/java/com/grack/nanojson/JsonTokener.java b/src/main/java/com/grack/nanojson/JsonTokener.java
index 3fa5807..5e1455a 100644
--- a/src/main/java/com/grack/nanojson/JsonTokener.java
+++ b/src/main/java/com/grack/nanojson/JsonTokener.java
@@ -762,7 +762,16 @@ final class JsonTokener implements Closeable {
 	private void expandBufferIfNeeded(int size) {
 		if (reusableBuffer.remaining() < size) {
 			int oldPos = reusableBuffer.position();
-			int increment = Math.max(512, size - reusableBuffer.remaining());
+			int increment = Math.max(
+					// don't reallocate too small parts
+					512,
+					Math.max(
+							// allocate at least as much as needed
+							size - reusableBuffer.remaining(),
+							// ensure exponential growth so that it's O(n) amortized
+							(int) (reusableBuffer.capacity() * 0.3)
+					)
+			);
 			CharBuffer newBuffer = CharBuffer.allocate(reusableBuffer.capacity() + increment);
 			reusableBuffer.flip(); // position -> 0, limit -> oldPos
 			newBuffer.put(reusableBuffer); // copy all existing data
```
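To see what that factor means in practice, here is a quick standalone simulation (my own illustration, not part of the PR) that counts how many characters get copied while building one long string under the two strategies:

```java
public class GrowthSimulation {
    // Counts characters copied during reallocations while appending n chars one by one.
    static long copiedChars(int n, boolean exponential) {
        int capacity = 512;
        long copied = 0;
        for (int used = 0; used < n; used++) {
            if (used == capacity) { // buffer full: reallocate and copy everything
                int increment = exponential
                        ? Math.max(512, (int) (capacity * 0.3)) // factor 1.3, as in the diff
                        : 512;                                  // old fixed increment
                copied += used;
                capacity += increment;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        int n = 1_000_000; // one long ~1 MB string
        System.out.println("fixed +512: " + copiedChars(n, false) + " chars copied"); // ~1e9, quadratic
        System.out.println("factor 1.3: " + copiedChars(n, true) + " chars copied");  // ~5e6, linear
    }
}
```

The roughly 200x difference in copy work is consistent with the 1.8s vs 26.5s slowdown observed earlier on the mock files with very long strings.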
Anyway, I tried to benchmark the old version and the new version using the data stored in the mocks (which is the kind of data we care about). Unfortunately I didn't get any significant performance improvement; it's basically the same. I didn't use JMH though, as it seems a bit complicated to set up. If you have it at hand and think it would give clearer results, I'd appreciate it if you could take my code and run it in JMH.

**Code**

```java
Path resourcePath = Paths.get("src", "test", "resources");
var files = Files.walk(resourcePath).collect(Collectors.toList());
var jsonSamples = new ArrayList<String>();
for (Path p : files) {
    if (!Files.isRegularFile(p) || !p.toString().endsWith(".json")) {
        continue;
    }
    try {
        // extract the JSON response bodies from the mock files
        // (parsed with Gson here, so that the extraction itself doesn't use nanojson)
        var mockRequest = JsonParser.parseReader(new FileReader(p.toFile()))
                .getAsJsonObject()
                .getAsJsonObject("response");
        if (mockRequest.getAsJsonObject("responseHeaders")
                .getAsJsonArray("content-type")
                .get(0)
                .getAsString()
                .contains("json")) {
            //jsonSamples.add(Files.readString(p));
            jsonSamples.add(mockRequest.get("responseBody").getAsString());
        }
    } catch (ClassCastException | IllegalArgumentException | NullPointerException
            | IllegalStateException ignored) {
        // skip mock files that don't have the expected structure
    }
}
assertEquals(525, jsonSamples.size());

var totaltime = Duration.ZERO;
int N = 20;
Duration first = null;
for (int i = 0; i < N + 1; ++i) {
    System.gc();
    var t0 = Instant.now();
    for (String content : jsonSamples) {
        com.grack.nanojson.JsonParser.any().from(content);
    }
    var t1 = Instant.now();
    var spent = Duration.between(t0, t1);
    System.out.println("This one took " + spent);
    if (first == null) {
        first = spent; // the first (cold start) run is excluded from the average
    } else {
        totaltime = totaltime.plus(spent);
    }
}
System.out.println("Average " + totaltime.dividedBy(N) + " cold start " + first);
```

**Results**

All CPU cores fixed to 2.434GHz, 8GB of RAM given to Java; I did 5 manual runs of the same code.

- TeamNewPipe/nanojson@e9d656d (before + changes to upstream): [benchmark output screenshot]
- TeamNewPipe/nanojson@bc71b09 (after + exponential buffer growth): [benchmark output screenshot]
- TeamNewPipe/nanojson@df185fe (after - Java 11 updates): [benchmark output screenshot]
---

- My fork: [benchmark output screenshot]
- NewPipe nanojson fork: [benchmark output screenshot]
- Pre-update nanojson fork latest: [benchmark output screenshot]

You are right, I'm not sure what's going on; I will need to have a deeper look.
---

This PR updates nanojson to the commit from TeamNewPipe/nanojson#27.

I included FireMasterK@165b459 as suggested in TeamNewPipe/nanojson#27 (comment) and had to make some further changes to make all tests pass (in particular `YoutubeStreamExtractor::getTags()` was failing). Just to be sure, I went through all of the raw usages of `LinkedHashMap::get()` (and similar methods of `JsonObject`'s base class) to make sure that we weren't depending on the types returned by direct raw access to the JSON object. I couldn't find other problematic places (except for timeago-generator), but I still replaced some raw accesses here and there.

@FireMasterK you might be interested in including my changes in your version of the extractor :-)